qsar model
Evaluation Framework for AI-driven Molecular Design of Multi-target Drugs: Brain Diseases as a Case Study
Cerveira, Arthur, Kremer, Frederico, Lourenço, Darling de Andrade, Corrêa, Ulisses B
The widespread application of Artificial Intelligence (AI) techniques has significantly influenced the development of new therapeutic agents. These computational methods can be used to design and predict the properties of generated molecules. Multi-target Drug Discovery (MTDD) is an emerging paradigm for discovering drugs against complex disorders that do not respond well to more traditional target-specific treatments, such as central nervous system, immune system, and cardiovascular diseases. Still, there is yet to be an established benchmark suite for assessing the effectiveness of AI tools for designing multi-target compounds. Standardized benchmarks allow for comparing existing techniques and promote rapid research progress. Hence, this work proposes an evaluation framework for molecule generation techniques in MTDD scenarios, considering brain diseases as a case study. Our methodology involves using large language models to select the appropriate molecular targets, gathering and preprocessing the bioassay datasets, training quantitative structure-activity relationship models to predict target modulation, and assessing other essential drug-likeness properties for implementing the benchmarks. Additionally, this work will assess the performance of four deep generative models and evolutionary algorithms over our benchmark suite. In our findings, both evolutionary algorithms and generative models can achieve competitive results across the proposed benchmarks.
QComp: A QSAR-Based Data Completion Framework for Drug Discovery
Yang, Bingjia, Chung, Yunsie, Yang, Archer Y., Yuan, Bo, Yu, Xiang
In drug discovery, in vitro and in vivo experiments reveal biochemical activities related to the efficacy and toxicity of compounds. The experimental data accumulate into massive, ever-evolving, and sparse datasets. Quantitative Structure-Activity Relationship (QSAR) models, which predict biochemical activities using only the structural information of compounds, face challenges in integrating the evolving experimental data as studies progress. We develop QSAR-Complete (QComp), a data completion framework to address this issue. Based on pre-existing QSAR models, QComp utilizes the correlation inherent in experimental data to enhance prediction accuracy across various tasks. Moreover, QComp emerges as a promising tool for guiding the optimal sequence of experiments by quantifying the reduction in statistical uncertainty for specific endpoints, thereby aiding in rational decision-making throughout the drug discovery process.
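The idea of using correlations between endpoints to sharpen a prediction and to quantify the resulting drop in uncertainty can be illustrated with a textbook conditional-Gaussian update. This is only a minimal sketch under stated assumptions: the abstract does not say which statistical model QComp uses, and the function names, endpoint names, and numbers below are hypothetical.

```python
def condition_on_observation(mu_a, var_a, mu_b, var_b, cov_ab, y_b):
    """Update the prediction for endpoint A once endpoint B has been measured.

    Assumption: (A, B) are treated as jointly Gaussian, with the QSAR
    predictions mu_a and mu_b as prior means. Returns (mean, variance) of A.
    """
    gain = cov_ab / var_b
    post_mean = mu_a + gain * (y_b - mu_b)
    post_var = var_a - gain * cov_ab
    return post_mean, post_var


def variance_reduction(cov_ab, var_b):
    """Expected drop in the variance of A from measuring endpoint B."""
    return cov_ab ** 2 / var_b


# Hypothetical QSAR prior for endpoint A, and a new measurement of endpoint B.
post_mean, post_var = condition_on_observation(
    mu_a=5.0, var_a=1.0, mu_b=6.0, var_b=4.0, cov_ab=1.6, y_b=7.0
)

# Rank hypothetical candidate experiments by how much they would reduce
# the uncertainty on endpoint A: name -> (covariance with A, own variance).
candidates = {"endpoint_B": (1.6, 4.0), "endpoint_C": (0.5, 1.0)}
best = max(candidates, key=lambda name: variance_reduction(*candidates[name]))
```

In this toy case the measurement of B moves the estimate for A from 5.0 toward 5.4 and shrinks its variance from 1.0 to about 0.36, and the ranking step picks the candidate experiment with the largest expected variance reduction.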
Exploring QSAR Models for Activity-Cliff Prediction
Dablander, Markus, Hanser, Thierry, Lambiotte, Renaud, Morris, Garrett M.
Pairs of similar compounds that only differ by a small structural modification but exhibit a large difference in their binding affinity for a given target are known as activity cliffs (ACs). It has been hypothesised that quantitative structure-activity relationship (QSAR) models struggle to predict ACs and that ACs thus form a major source of prediction error. However, a study to explore the AC-prediction power of modern QSAR methods and its relationship to general QSAR-prediction performance is lacking. We systematically construct nine distinct QSAR models by combining three molecular representation methods (extended-connectivity fingerprints, physicochemical-descriptor vectors and graph isomorphism networks) with three regression techniques (random forests, k-nearest neighbours and multilayer perceptrons); we then use each resulting model to classify pairs of similar compounds as ACs or non-ACs and to predict the activities of individual molecules in three case studies: dopamine receptor D2, factor Xa, and SARS-CoV-2 main protease. We observe low AC-sensitivity amongst the tested models when the activities of both compounds are unknown, but a substantial increase in AC-sensitivity when the actual activity of one of the compounds is given. Graph isomorphism features are found to be competitive with or superior to classical molecular representations for AC-classification and can thus be employed as baseline AC-prediction models or simple compound-optimisation tools. For general QSAR-prediction, however, extended-connectivity fingerprints still consistently deliver the best performance. Our results provide strong support for the hypothesis that indeed QSAR methods frequently fail to predict ACs. We propose twin-network training for deep learning models as a potential future pathway to increase AC-sensitivity and thus overall QSAR performance.
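The AC-classification task described above can be sketched in a few lines. This is an illustrative toy, not the authors' protocol: the similarity and activity cutoffs are assumed values, the integer sets stand in for fingerprint bits that would normally come from a cheminformatics toolkit, and all activities are made up.

```python
from itertools import combinations

SIM_CUTOFF = 0.7  # assumed Tanimoto cutoff for "similar" (illustrative)
ACT_CUTOFF = 2.0  # assumed activity gap in log units for a cliff (illustrative)


def tanimoto(a, b):
    """Tanimoto similarity between two sets of fingerprint bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def label_pairs(fps, acts):
    """Label every similar pair as AC (True) or non-AC (False)."""
    labels = {}
    for n1, n2 in combinations(fps, 2):
        if tanimoto(fps[n1], fps[n2]) < SIM_CUTOFF:
            continue  # dissimilar pairs are not candidates for cliffs
        labels[(n1, n2)] = abs(acts[n1] - acts[n2]) >= ACT_CUTOFF
    return labels


def ac_sensitivity(true_labels, pred_labels):
    """Fraction of true ACs that are also predicted to be ACs."""
    true_acs = [pair for pair, is_ac in true_labels.items() if is_ac]
    if not true_acs:
        return float("nan")
    return sum(pred_labels.get(pair, False) for pair in true_acs) / len(true_acs)


# Toy data: stand-in fingerprints, measured and QSAR-predicted activities.
fps = {
    "A": {1, 2, 3, 4, 5, 6, 7, 8},
    "B": {1, 2, 3, 4, 5, 6, 7, 9},
    "C": {1, 2, 3, 4, 5, 6, 8, 10},
    "D": {20, 21, 22},
}
measured = {"A": 8.5, "B": 5.9, "C": 8.1, "D": 4.0}
predicted = {"A": 7.6, "B": 6.3, "C": 7.9, "D": 4.2}

true_labels = label_pairs(fps, measured)

# Case 1: both activities in each pair come from the model.
sens_both_predicted = ac_sensitivity(true_labels, label_pairs(fps, predicted))

# Case 2: the measured activity of compound A is given, the rest is predicted.
one_known = dict(predicted, A=measured["A"])
sens_one_known = ac_sensitivity(true_labels, label_pairs(fps, one_known))
```

In this toy set the smooth model predictions miss the single cliff (A, B) when both activities are predicted, but recover it once the measured activity of A is supplied, which mirrors the qualitative pattern the abstract reports.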
Multi-Task Deep Neural Networks for Ames Mutagenicity Prediction
The Ames mutagenicity test is the most frequently used assay for estimating the mutagenic potential of drug candidates. Although the test yields experimental results for several strains of Salmonella typhimurium, the vast majority of published in silico models for predicting mutagenicity do not take into account the results of the individual experiments conducted for each strain. Instead, such QSAR models are generally trained on overall labels (i.e. ...). Recently, neural-based models combined with multi-task learning strategies have yielded interesting results in different domains, given their capability to model multi-target functions. In this scenario, we propose a novel neural-based QSAR model to predict mutagenicity that leverages experimental results from the different strains involved in the Ames test by means of a multi-task learning approach.
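One common ingredient of multi-task training on per-strain data is a loss that skips strains for which a compound was never tested. The sketch below shows such a masked binary cross-entropy; it is an assumption about how this kind of model could be trained, not the loss the authors report, and the probabilities and labels are invented.

```python
import math


def masked_bce(probs, labels):
    """Mean binary cross-entropy over the strains that were actually tested.

    probs:  predicted probability of a positive result, one per strain (task)
    labels: 1 (positive), 0 (negative), or None (strain not tested)
    """
    eps = 1e-12  # keeps log() finite for probabilities of exactly 0 or 1
    losses = []
    for p, y in zip(probs, labels):
        if y is None:
            continue  # untested strain: contributes nothing to the loss
        p = min(max(p, eps), 1 - eps)
        losses.append(-(y * math.log(p) + (1 - y) * math.log(1 - p)))
    return sum(losses) / len(losses) if losses else 0.0


# One hypothetical compound, four strain-level tasks, two of them untested.
loss = masked_bce([0.9, 0.2, 0.5, 0.7], [1, 0, None, None])
```

Only the two tested strains contribute here, so a compound with sparse strain coverage can still be used for training without inventing labels for the missing experiments.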
Semi-Supervised GCN for learning Molecular Structure-Activity Relationships
Ragno, Alessio, Savoia, Dylan, Capobianco, Roberto
Since the introduction of artificial intelligence in medicinal chemistry, the need has emerged to analyse how molecular property variation is modulated by either single atoms or chemical groups. In this paper, we propose training a graph-to-graph neural network with semi-supervised learning to attribute structure-property relationships. As initial case studies, we apply the method to solubility and molecular acidity and check its consistency against known experimental chemical data. As a final goal, our approach could become a valuable tool for problems such as activity cliffs, lead optimization and de novo drug design.
On Outliers and Activity Cliffs: Why QSAR Often Disappoints
Quantitative structure activity relationships (QSAR) have been around for many years and have been employed in numerous fields from drug design to environmental toxicology. Countless papers have been written employing a wide variety of descriptors and computational methods in order to determine them. Nevertheless, while the jury is still out, it is safe to say that QSAR have not generally lived up to expectations, especially in cases where they are applied to data sets determined after the QSAR models were constructed. But this is true even in many "typical cases" where all of the data are known beforehand and are divided into training and test sets in order to construct and validate a model. Certainly the number of parameters available for use in QSAR models is sufficiently large and diverse to ensure reasonable predictions of bioactivity.
Deep reinforcement learning for de novo drug design
We have devised and implemented a novel computational strategy for de novo design of molecules with desired properties termed ReLeaSE (Reinforcement Learning for Structural Evolution). On the basis of deep and reinforcement learning (RL) approaches, ReLeaSE integrates two deep neural networks (generative and predictive) that are trained separately but are used jointly to generate novel targeted chemical libraries. ReLeaSE uses a simple representation of molecules by their simplified molecular-input line-entry system (SMILES) strings only. Generative models are trained with a stack-augmented memory network to produce chemically feasible SMILES strings, and predictive models are derived to forecast the desired properties of the de novo–generated compounds. In the first phase of the method, generative and predictive models are trained separately with a supervised learning algorithm. In the second phase, both models are trained jointly with the RL approach to bias the generation of new ...
Application of generative autoencoder in de novo molecular design
Blaschke, Thomas, Olivecrona, Marcus, Engkvist, Ola, Bajorath, Jürgen, Chen, Hongming
A major challenge in computational chemistry is the generation of novel molecular structures with desirable pharmacological and physicochemical properties. In this work, we investigate the potential use of autoencoders, a deep learning methodology, for de novo molecular design. Various generative autoencoders were used to map molecular structures into a continuous latent space and vice versa, and their performance as structure generators was assessed. Our results show that the latent space preserves the chemical similarity principle and thus can be used for the generation of analogue structures. Furthermore, the latent space created by the autoencoders was searched systematically to generate novel compounds with predicted activity against dopamine receptor type 2, and compounds similar to known active compounds not included in the training set were identified.